Approaching the linguistic complexity (arXiv stamp: 21 Jan 2009)

Author

Adam Orczyk

Abstract

We analyze the rank-frequency distributions of words in selected English and Polish texts. We compare scaling properties of these distributions in both languages. We also study a few small corpora of Polish literary texts and find that for a corpus consisting of texts written by different authors the basic scaling regime is broken more strongly than in the case of a comparable corpus consisting of texts written by the same author. Similarly, for a corpus consisting of texts translated into Polish from other languages the scaling regime is broken more strongly than for a comparable corpus of native Polish texts. Moreover, based on the British National Corpus, we consider the rank-frequency distributions of the grammatically basic forms of words (lemmas) tagged with their proper part of speech. We find that these distributions do not scale if each part of speech is analyzed separately. The only part of speech that independently develops a trace of scaling is verbs.
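As a rough illustration of what a rank-frequency analysis involves, the sketch below counts words in a text, sorts the counts by rank, and estimates a scaling exponent from a straight-line fit in log-log coordinates. It is not the authors' procedure: the function names `rank_frequency` and `zipf_exponent`, the letter-run tokenization, and the unweighted least-squares fit are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return word counts sorted in descending order (rank 1 first)."""
    # Assumption: a word is a run of Unicode letters, case-folded.
    # The abstract does not say how the texts were tokenized.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def zipf_exponent(frequencies):
    """Estimate alpha in f(r) ~ r**(-alpha) by least squares on log-log data."""
    n = len(frequencies)
    if n < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, n + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying distribution; report alpha > 0.
    return -sxy / sxx


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    freqs = rank_frequency(sample)
    print(freqs)
    print(zipf_exponent(freqs))
```

A single fitted exponent only summarizes the distribution where one scaling regime holds. Detecting a broken regime of the kind the abstract describes would require fitting separate rank ranges or inspecting the log-log plot directly.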



Publication date: 2013
